    Temporal Logic Monitoring Rewards via Transducers

    In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting when the considered domain is not naturally Markovian but becomes so only after careful engineering of an extended state space. The extended states record information from the past that is sufficient to assign rewards by looking just at the last state and action. Non-Markovian Reward Decision Processes (NMRDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logics on finite traces such as LTLf/LDLf, with the great advantage of higher abstraction and succinctness; they can then be automatically compiled into an MDP with an extended state space. We contribute to the techniques for handling temporal rewards and to the solutions for engineering them. We first present an approach to compiling temporal rewards that merges the formula automata into a single transducer, sometimes saving up to an exponential number of states. We then define monitoring rewards, which add a further level of abstraction to temporal rewards by adopting the four-valued conditions of runtime monitoring; we argue that our compilation technique allows for an efficient handling of monitoring rewards. Finally, we discuss applications to reinforcement learning.
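
    The compilation described above can be pictured as one reward transducer obtained by taking the product of the formula automata: it reads the trace of propositional evaluations and outputs, at each step, the sum of the rewards of the formulas currently satisfied. Below is a minimal, illustrative sketch of that idea; it assumes each temporal reward has already been compiled into a DFA, and the names (RewardDFA, RewardTransducer) are not from the paper.

```python
# Sketch: merging several temporal-reward DFAs into a single reward transducer.
# Each DFA reads propositional evaluations of MDP states; a formula pays its
# reward whenever its DFA component is in an accepting state.
# (Illustrative names only; not the paper's implementation.)

class RewardDFA:
    def __init__(self, initial, transitions, accepting, reward):
        self.initial = initial            # initial DFA state
        self.transitions = transitions    # dict: (state, symbol) -> state
        self.accepting = accepting        # set of accepting states
        self.reward = reward              # reward paid while accepting

class RewardTransducer:
    """Product of several RewardDFAs: one state vector, summed reward output."""

    def __init__(self, dfas):
        self.dfas = dfas
        self.state = tuple(d.initial for d in dfas)

    def step(self, symbol):
        self.state = tuple(
            d.transitions[(q, symbol)] for d, q in zip(self.dfas, self.state)
        )
        # Output: sum of the rewards of the formulas currently satisfied.
        return sum(
            d.reward for d, q in zip(self.dfas, self.state) if q in d.accepting
        )

# Toy example: reward 1.0 once proposition "goal" has been seen ("eventually goal").
eventually_goal = RewardDFA(
    initial=0,
    transitions={(0, "goal"): 1, (0, "other"): 0, (1, "goal"): 1, (1, "other"): 1},
    accepting={1},
    reward=1.0,
)
t = RewardTransducer([eventually_goal])
print([t.step(s) for s in ["other", "goal", "other"]])  # [0, 1.0, 1.0]
```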

    Exploiting Multiple Abstractions in Episodic RL via Reward Shaping

    One major limitation to the applicability of Reinforcement Learning (RL) to many practical domains is the large number of samples required to learn an optimal policy. To address this problem and improve learning efficiency, we consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain. Each layer is an MDP representing a coarser model of the one immediately below it in the hierarchy. In this work, we propose a novel form of Reward Shaping in which the solution obtained at the abstract level is used to offer rewards to the more concrete MDP, so that the abstract solution guides learning in the more complex domain. In contrast with other works in Hierarchical RL, our technique imposes few requirements on the design of the abstract models and is also tolerant to modeling errors, which makes the proposed approach practical. We formally analyze the relationship between the abstract models and the exploration heuristic induced in the lower-level domain. Moreover, we prove that the method guarantees convergence to an optimal policy, and we demonstrate its effectiveness experimentally. (This is an extended version of the paper presented at AAAI 2023, https://doi.org/10.1609/aaai.v37i6.2588.)
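
    One simple way to realise "the abstract solution offers rewards to the concrete MDP" is potential-based reward shaping, with the abstract value function as the potential. The sketch below assumes the abstract MDP has already been solved and that an abstraction mapping from ground to abstract states is available; it illustrates the general idea and is not the paper's exact construction.

```python
# Sketch of potential-based reward shaping driven by an abstract solution.
# Assumptions (not from the paper): the abstract MDP is already solved,
# abstract_value maps abstract states to values, and abstraction(s) maps a
# ground state to its abstract state.

def make_shaped_reward(reward_fn, abstraction, abstract_value, gamma=0.99):
    """Wrap a ground reward function with a potential-based shaping term.

    reward_fn(s, a, s_next)  -> ground reward
    abstraction(s)           -> abstract state of ground state s
    abstract_value[alpha]    -> value of abstract state alpha (e.g. from value iteration)
    """
    def phi(s):
        return abstract_value[abstraction(s)]

    def shaped_reward(s, a, s_next):
        # F(s, s') = gamma * phi(s') - phi(s): shaping of this form preserves
        # the optimal policy of the ground MDP.
        return reward_fn(s, a, s_next) + gamma * phi(s_next) - phi(s)

    return shaped_reward

# Toy usage: a 1-D corridor of 9 cells whose abstraction groups cells into rooms of size 3.
abstract_value = {0: 0.0, 1: 0.5, 2: 1.0}                  # e.g. solved abstract MDP
base_reward = lambda s, a, s_next: 1.0 if s_next == 8 else 0.0
shaped = make_shaped_reward(base_reward, lambda s: s // 3, abstract_value)
print(shaped(2, "right", 3))   # crossing into a better room adds a positive shaping term
```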

    A PoW-less Bitcoin with Certified Byzantine Consensus

    Distributed Ledger Technologies (DLTs), when managed by a few trusted validators, require most but not all of the machinery available in public DLTs. In this work, we explore one possible way to profit from this state of affairs. We devise a combination of a modified Practical Byzantine Fault Tolerant (PBFT) protocol and a revised Flexible Round-Optimized Schnorr Threshold Signatures (FROST) scheme, and we then inject the resulting proof-of-authority consensus algorithm into Bitcoin (chosen for the reliability, openness, and liveness it brings), replacing its PoW machinery. The combined protocol may operate as a modern, safe foundation for digital payment systems and Central Bank Digital Currencies (CBDCs).
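
    As a rough illustration of the proof-of-authority acceptance rule, the toy sketch below checks that a block is endorsed by a PBFT-style quorum of 2f+1 out of n = 3f+1 permissioned validators. A real deployment would verify a single FROST threshold signature rather than counting individual endorsements; all names here are illustrative, not the paper's code.

```python
# Toy sketch of PBFT-style finality for a permissioned (proof-of-authority) ledger:
# a block is final once 2f+1 of the n = 3f+1 validators have endorsed it.
# Endorsements are abstracted as validator ids; real systems would check signatures
# (e.g. one aggregated FROST threshold signature).

def quorum_size(n_validators: int) -> int:
    # PBFT tolerates f Byzantine validators out of n = 3f + 1.
    f = (n_validators - 1) // 3
    return 2 * f + 1

def is_final(block_hash: str, endorsements: dict, validators: set) -> bool:
    """endorsements: validator_id -> hash of the block that validator endorsed."""
    votes = {v for v, h in endorsements.items() if v in validators and h == block_hash}
    return len(votes) >= quorum_size(len(validators))

validators = {"v1", "v2", "v3", "v4"}             # n = 4, so f = 1 and the quorum is 3
endorsements = {"v1": "abc", "v2": "abc", "v3": "abc", "v4": "xyz"}
print(is_final("abc", endorsements, validators))  # True
```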

    Standard Grammars for LTL and LDL

    The heterogeneity of tools that support temporal logic formulae poses several challenges in terms of interoperability. This document proposes standard grammars for Linear Temporal Logic (LTL) (Pnueli 1977) and Linear Dynamic Logic (LDL) (Vardi 2011; De Giacomo and Vardi 2013).
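
    To give a flavour of what such a grammar fixes (operator names, precedence, associativity), here is a small hand-written parser for a tiny LTL fragment. It is purely illustrative and is not the grammar proposed in the document.

```python
# Illustrative parser for a tiny LTL fragment: atoms, !, &, |, X, F, G, U.
# Not the standard grammar proposed in the document; just a sketch of the
# kind of syntax a standard grammar would pin down.

import re

TOKEN = re.compile(r"\s*([()!&|UXFG]|[a-z_][a-z0-9_]*)")

def tokenize(s):
    pos, out = 0, []
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise SyntaxError(f"bad input at {s[pos:]!r}")
        out.append(m.group(1))
        pos = m.end()
    return out

def parse(tokens):
    """formula := until ( ('&' | '|') until )*   (lowest precedence)"""
    node, rest = parse_until(tokens)
    while rest and rest[0] in ("&", "|"):
        op, rest = rest[0], rest[1:]
        rhs, rest = parse_until(rest)
        node = (op, node, rhs)
    return node, rest

def parse_until(tokens):
    """until := unary ('U' unary)*"""
    node, rest = parse_unary(tokens)
    while rest and rest[0] == "U":
        rhs, rest = parse_unary(rest[1:])
        node = ("U", node, rhs)
    return node, rest

def parse_unary(tokens):
    """unary := ('!' | 'X' | 'F' | 'G') unary | atom | '(' formula ')'"""
    head, rest = tokens[0], tokens[1:]
    if head in ("!", "X", "F", "G"):
        node, rest = parse_unary(rest)
        return (head, node), rest
    if head == "(":
        node, rest = parse(rest)
        return node, rest[1:]          # drop the closing ')'
    return ("atom", head), rest        # lowercase identifier = atomic proposition

tree, _ = parse(tokenize("G (request & X F grant)"))
print(tree)   # ('G', ('&', ('atom', 'request'), ('X', ('F', ('atom', 'grant')))))
```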

    Compositional Approach to Translate LTLf/LDLf into Deterministic Finite Automata

    The translation from temporal logics to automata is the workhorse algorithm of several techniques in computer science and AI, such as reactive synthesis, reasoning about actions, FOND planning with temporal specifications, and reinforcement learning with non-Markovian rewards, just to name a few. Unfortunately, the problem is computationally intractable, requiring the implementation of several heuristics to make it usable in practice. In this paper, following the recent interest in temporal logic formalisms over finite traces, we present a compositional approach to translating Linear Temporal Logic (LTLf) and Linear Dynamic Logic (LDLf) on finite traces into Deterministic Finite Automata (DFA). That is, we inductively transform each LTLf/LDLf subformula into a DFA and combine them through automata operators. By relying on efficient semi-symbolic automata representations, we empirically show the effectiveness of our approach and its competitiveness with similar tools. Moreover, this is the first work that provides a scalable and practical tool supporting the translation to DFA not only for LTLf but also for full LDLf.
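
    The compositional idea can be sketched with explicit-state automata: build a DFA for each subformula and combine them with automata operations, e.g. product for conjunction and complementation for negation. The toy code below illustrates this; the actual tool relies on semi-symbolic representations rather than explicit states, and the class here is not its API.

```python
# Sketch of compositional DFA construction: subformula DFAs combined through
# automata operations (complement for negation, product for conjunction).
# Explicit-state and purely illustrative.

from itertools import product as cartesian

class DFA:
    def __init__(self, states, alphabet, initial, delta, accepting):
        self.states, self.alphabet = states, alphabet
        self.initial, self.delta, self.accepting = initial, delta, accepting

    def complement(self):
        # Negation of a formula = complement of its (complete) DFA.
        return DFA(self.states, self.alphabet, self.initial, self.delta,
                   self.states - self.accepting)

    def intersect(self, other):
        # Conjunction of two formulas = product of their DFAs.
        states = set(cartesian(self.states, other.states))
        delta = {((p, q), a): (self.delta[(p, a)], other.delta[(q, a)])
                 for (p, q) in states for a in self.alphabet}
        accepting = {(p, q) for (p, q) in states
                     if p in self.accepting and q in other.accepting}
        return DFA(states, self.alphabet, (self.initial, other.initial),
                   delta, accepting)

    def accepts(self, word):
        q = self.initial
        for a in word:
            q = self.delta[(q, a)]
        return q in self.accepting

# Toy subformula DFAs over the alphabet {"a", "b"}:
# "F a" (eventually a) and "G b" (always b), combined as F a & !(G b).
sigma = {"a", "b"}
eventually_a = DFA({0, 1}, sigma, 0,
                   {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}, {1})
always_b = DFA({0, 1}, sigma, 0,
               {(0, "b"): 0, (0, "a"): 1, (1, "a"): 1, (1, "b"): 1}, {0})
combined = eventually_a.intersect(always_b.complement())
print(combined.accepts(["b", "a"]))   # True: "a" eventually holds, "b" does not always hold
```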

    Planning for Temporally Extended Goals in Pure-Past Linear Temporal Logic: A Polynomial Reduction to Standard Planning

    We study temporally extended goals expressed in Pure-Past LTL (PPLTL). PPLTL is particularly interesting for expressing goals, since it allows sophisticated tasks to be expressed as in the Formal Methods literature, while the worst-case computational complexity of planning in both deterministic and nondeterministic domains (FOND) remains the same as for classical reachability goals. However, while the theory of planning for PPLTL goals is well understood, practical tools have not been specifically investigated. In this paper, we make a significant leap forward in the construction of actual tools to handle PPLTL goals. We devise a technique to polynomially translate planning for PPLTL goals into standard planning. We show the formal correctness of the translation, its complexity, and its practical effectiveness through comparative experiments. As a result, our translation enables state-of-the-art tools, such as FD or MyND, to handle PPLTL goals seamlessly, maintaining the impressive performance they have for classical reachability goals.
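
    The intuition behind the polynomial translation is that the truth of a pure-past subformula at step i depends only on the current state and on the subformula values at step i-1, so each subformula can be tracked by one extra boolean fluent maintained by the translated domain. The sketch below illustrates this bookkeeping on a trace; it is an illustration of the idea, not the paper's exact encoding.

```python
# Sketch: evaluating PPLTL formulas step by step using only last step's
# subformula values, which is why one boolean fluent per subformula suffices
# in the translated planning domain.  Illustrative encoding only.

def eval_ppltl(formula, state, prev):
    """Evaluate a PPLTL formula given the current state (a set of true
    propositions) and `prev`, a dict with the previous step's subformula values."""
    kind = formula[0]
    if kind == "atom":
        return formula[1] in state
    if kind == "not":
        return not eval_ppltl(formula[1], state, prev)
    if kind == "and":
        return all(eval_ppltl(f, state, prev) for f in formula[1:])
    if kind == "yesterday":                       # Y f: f held at the previous step
        return prev.get(("Y", formula[1]), False)
    if kind == "since":                           # f1 S f2
        f1, f2 = formula[1], formula[2]
        return eval_ppltl(f2, state, prev) or (
            eval_ppltl(f1, state, prev) and prev.get(("S", f1, f2), False))
    raise ValueError(kind)

def advance(subformulas, state, prev):
    """Produce the next step's bookkeeping fluents from the current step."""
    nxt = {}
    for f in subformulas:
        if f[0] == "yesterday":
            nxt[("Y", f[1])] = eval_ppltl(f[1], state, prev)
        elif f[0] == "since":
            nxt[("S", f[1], f[2])] = eval_ppltl(f, state, prev)
    return nxt

# Toy goal: "clean S requested" (the room has stayed clean since it was requested).
goal = ("since", ("atom", "clean"), ("atom", "requested"))
trace = [{"requested", "clean"}, {"clean"}, {"clean"}]
prev = {}
for state in trace:
    holds = eval_ppltl(goal, state, prev)
    prev = advance([goal], state, prev)
print(holds)   # True: the goal is satisfied at the end of the trace
```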

    A Practical Framework for General Dialogue-Based Bilateral Interactions

    For autonomous agents and services to cooperate and interact in multi-agent environments, they require well-defined protocols. A multitude of protocol languages for multi-agent systems have been proposed in the past, but they have mostly remained theoretical or have only limited prototypical implementations. This work proposes a practical realisation of a general framework for defining dialogue-based bilateral interaction protocols that supports arbitrary agent-based interactions. Crucially, this work is tightly integrated with a modern framework for the creation of autonomous agents and multi-agent systems, making it possible to go from the specification of protocols to their implementation and use by agents, and enabling the evaluation of a protocol's effectiveness and applicability in real-world use cases.
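
    A dialogue-based bilateral protocol can be pictured as a finite state machine over typed messages exchanged by the two parties. The sketch below is a toy, framework-agnostic illustration; the protocol, performative names, and functions are assumptions, not part of the framework described above.

```python
# Toy bilateral dialogue protocol as a finite state machine: each entry says
# which performative which party may send in a given state, and where the
# dialogue moves next.  Illustrative only.

# (state, sender, performative) -> next state
PROPOSE_PROTOCOL = {
    ("start",     "initiator", "propose"):          "proposed",
    ("proposed",  "responder", "accept"):           "agreed",
    ("proposed",  "responder", "reject"):           "failed",
    ("proposed",  "responder", "counter-propose"):  "countered",
    ("countered", "initiator", "accept"):           "agreed",
    ("countered", "initiator", "reject"):           "failed",
}
FINAL_STATES = {"agreed", "failed"}

def run_dialogue(protocol, messages, state="start"):
    """Check a message sequence against the protocol, returning the final state."""
    for sender, performative in messages:
        key = (state, sender, performative)
        if key not in protocol:
            raise ValueError(f"illegal move {performative!r} by {sender!r} in state {state!r}")
        state = protocol[key]
    if state not in FINAL_STATES:
        raise ValueError(f"dialogue ended in non-final state {state!r}")
    return state

print(run_dialogue(PROPOSE_PROTOCOL, [
    ("initiator", "propose"),
    ("responder", "counter-propose"),
    ("initiator", "accept"),
]))   # agreed
```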

    Restraining Bolts for Reinforcement Learning Agents

    In this work, we have investigated the concept of a “restraining bolt”, inspired by Science Fiction. We have two distinct sets of features extracted from the world: one used by the agent and one used by the authority imposing restraining specifications on the behaviour of the agent (the “restraining bolt”). The two sets of features, and hence the models of the world attainable from them, are apparently unrelated, since they are of interest to independent parties; however, they both account for (aspects of) the same world. We have considered the case in which the agent is a reinforcement learning agent over a set of low-level (subsymbolic) features, while the restraining bolt is specified logically, using linear-time logic on finite traces (LTLf/LDLf), over a set of high-level symbolic features. We show formally, and illustrate with examples, that, under general circumstances, the agent can learn while shaping its goals to suitably conform (as much as possible) to the restraining bolt specifications.
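
    Schematically, the bolt advances a DFA (compiled from the LTLf/LDLf specification) on the high-level features and pays an extra reward in its accepting states, while the agent learns over the pair (its own observation, bolt automaton state). The sketch below is illustrative only; the names and signatures are assumptions, not the paper's code.

```python
# Sketch of the restraining-bolt setup: the learning agent sees only its own
# low-level observation, while the bolt tracks the world through separately
# extracted high-level features and a DFA compiled from an LTLf/LDLf spec.
# Learning happens over (observation, bolt automaton state); the bolt pays an
# extra reward in its accepting states.  Illustrative names only.

class RestrainingBolt:
    def __init__(self, transitions, initial, accepting, bonus):
        self.transitions = transitions   # (dfa_state, high_level_symbol) -> dfa_state
        self.state = initial
        self.accepting = accepting
        self.bonus = bonus

    def step(self, high_level_symbol):
        self.state = self.transitions[(self.state, high_level_symbol)]
        return self.bonus if self.state in self.accepting else 0.0

def bolted_step(env_step, bolt, high_level_features, action):
    """One interaction step: the agent's reward is augmented by the bolt's."""
    obs, reward, done = env_step(action)             # agent's own low-level view
    bolt_reward = bolt.step(high_level_features(obs))
    # The RL agent should learn over the pair (obs, bolt.state).
    return (obs, bolt.state), reward + bolt_reward, done

# Toy usage: the bolt pays 1.0 once the high-level event "delivered" is seen.
bolt = RestrainingBolt(
    transitions={(0, "delivered"): 1, (0, "none"): 0, (1, "delivered"): 1, (1, "none"): 1},
    initial=0, accepting={1}, bonus=1.0,
)
fake_env_step = lambda a: ({"pos": a}, 0.0, False)
features = lambda obs: "delivered" if obs["pos"] == 3 else "none"
print(bolted_step(fake_env_step, bolt, features, action=3))  # (({'pos': 3}, 1), 1.0, False)
```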